The  Effect  of  Structured  Contextual  Tones  on 


Psychophysical  Frequency  Discrimination 

Recent  research  has  investigated  the  relative  contribution  of  an 
individual’s  knowledge  of  events  and  the  actual  sensory  input  in  perceptual 
processing.  It  is  generally  agreed  that  in  the  perception  of  speech  there  must 
be  an  interaction  between  these  two  sources  of  information  since  sensory  data 
alone  cannot  be  sufficient  for  the  comprehension  of  fluent  speech.  Our  internal 
knowledge  of  language  and  its  rules  makes  a  significant  contribution  to  what  we 
hear in speech (Marslen-Wilson & Tyler, 1980; Marslen-Wilson & Welsh, 1978).
This  interaction  also  has  been  illustrated  in  other  areas.  Knowledge  in  the 
form  of  grammatical  structure  has  been  shown  to  facilitate  the  classification  of 
visual  letter  patterns  (Reber,  1969,  1976;  Reber  &  Allen,  1978;  Reber  &  Lewis, 
1977)  and  complex  nonspeech  patterns  (Deutsch,  1980;  Dewar,  Cuddy  &  Mewhort, 
1977; Howard & Ballas, 1980).

In  the  majority  of  these  studies  the  experimental  task  involves  the 
classification  of  entire  patterns.  Relatively  little  research  has  focused  on 
the  role  of  top-down  and  bottom-up  processing  in  the  resolution  of  individual 
components  of  multi-component  patterns.  This  paper  will  address  the  issue  by 
investigating  how  a  higher  level  pattern  structure  influences  the  ability  to 
discriminate  changes  in  the  frequency  of  a  single  element  of  eleven-component, 
pure-tone  patterns. 

Top-Down Processing in Speech Perception

The  most  direct  evidence  for  the  existence  of  top-down  (knowledge-driven) 
and  bottom-up  (data-driven)  processes  is  found  in  the  speech  perception 
literature. There is little doubt that these two processes interact and, in
fact, are interdependent. To illustrate this point, consider a purely bottom-up model
of  speech  perception.  In  such  a  model  speech  perception  can  be  viewed  as  a 
series  of  steps  starting  with  an  auditory  input  and  ending  with  a  mental 
representation  of  what  has  been  said.  Phonemes  are  pulled  from  the  speech 
waveforms  by  feature  extraction  or  a  similar  process.  The  consonants  and  vowels 
are  combined  to  form  syllables;  meanings  are  then  retrieved  from  memory.  The 
important  aspect  of  this  model  of  speech  perception  is  its  unidirectional 
nature.  Inputs  in  the  process  are  only  permitted  from  lower  to  higher 
analytical  levels. 

This  model  of  perception  has  difficulties  from  the  outset.  Spectrograms  of 
speech  waveforms  show  a  lack  of  segmentation  that  is  assumed  in  the  model.  Word 
boundaries  are  slurred  and  often  the  segmentation  expected  between  words 
actually  occurs  within  words.  In  addition,  sounds  at  the  lowest  level  are 
subject  to  contextual  constraints.  The  actual  sound  of  a  consonant  is  dependent 
upon  other  phonemes  in  the  syllable  (Liberman,  Cooper,  Shankweiler,  & 
Studdert-Kennedy,  1967).  In  fact,  silence  can  be  perceived  as  a  consonant. 
Bastian,  Eimas,  and  Liberman  (1961)  show  that  observers  interpret  the  word 
"sore"  as  "store"  when  a  short  interval  of  silence  is  placed  between  the  "s"  and 
"o." 

Several  studies  have  shown  that  contextual  information  at  higher  analytical 
levels can also influence perception of speech. Marslen-Wilson (1975) employed
a  shadowing  task  to  investigate  perception  of  mispronounced  words.  Observers 
were  asked  to  repeat  passages  varying  in  syntactic  and  semantic  congruence.  In 
congruent  passages  the  observers  restored  mispronounced  words  to  their  expected 
forms.  This  suggests  that  higher  order  contextual  information  influenced  lower 
level processes. To investigate the possibility that the mispronunciations were
perceived and then corrected, Marslen-Wilson and Welsh (1978) recorded observer
response  times.  The  latencies  for  correct  and  mispronounced  words  were  not 
different,  suggesting  that  perceptual  and  not  conscious  restoration  processes 
were  responsible  for  the  corrections. 

Perhaps  the  most  striking  example  of  the  power  of  contextual  information  is 
shown  in  the  research  of  Warren  and  his  colleagues.  Warren  deleted  a  phoneme 
from  a  word  and  replaced  it  with  the  nonspeech  sound  of  a  "cough"  or  a  pure  tone 
(Warren,  1970).  After  listening  to  the  word  in  a  sentence  most  individuals 
reported that no mispronunciations were present, even though they were
asked  to  listen  for  and  localize  such  occurrences. 

Using  this  "phoneme  restoration"  effect  Warren  and  Warren  (1970) 
demonstrate  that  a  partially  ambiguous  word  can  be  interpreted  in  a  variety  of 
ways, depending on the context. They presented the sentence "It was found that
the *eel was on the ___" to observers with one of four words spliced onto the end:
axle,  shoe,  table,  or  orange.  Depending  on  the  final  word  the  same  sound 
pattern  (*)  was  perceived  as  "wh,"  "h,"  "m,"  or  "p." 

These  research  findings  strongly  support  the  belief  that  both  top-down  and 
bottom-up  processes  exist  in  the  perception  of  fluent  speech.  The  bottom-up 
model  of  speech  perception  presented  earlier  must  be  modified  to  include  the 
influence of higher order processes. Our knowledge of the rules and
expectancies of language influences what we actually perceive by guiding "the
construction of the representation of the speech input" (Glass, Holyoak, &
Santa, 1979, p. 51).

Some  of  the  top-down  processes  in  speech  perception  can  be  defined.  These 
are  the  phonologic,  prosodic,  semantic  and  syntactic  rules  of  speech,  as  well  as 
contextual cues which form expectancies in a conversation. Researchers in the
artificial intelligence area of automatic speech understanding have renamed
these  rules  and  expectancies  "knowledge  sources"  (Reddy,  1976).  In  the  HEARSAY 
II  speech  understanding  system  these  knowledge  sources  can  be  added  or  deleted, 
thereby  increasing  or  decreasing  system  performance  levels  (Reddy,  Erman,  & 
Neely, 1973; Lesser, Fennell, Erman, & Reddy, 1975). This conceptualization of
available  knowledge  is  quite  useful  in  discussing  top-down  and  bottom-up 
processing  and  will  be  used  throughout  the  remainder  of  the  paper. 

Top-Down Processing in Classification of Entire Patterns

Although  not  as  conclusive  as  the  case  for  speech  perception,  there  is  also 
evidence  pointing  to  the  existence  of  top-down  and  bottom-up  processing  in  other 
modes  of  perception.  Reber  (Reber,  1969,  1976;  Reber  &  Allen,  1978;  Reber  & 
Lewis,  1977)  has  shown  that  top-down  processes  are  evident  in  visual 
classification tasks. Observers were asked to classify letter patterns as
either grammatical or nongrammatical. The grammatical patterns were generated
by a transition code which determined the sequential order of letters in the
letter strings. The classification of grammatical patterns was consistently
better than the classification of nongrammatical ones. This suggests that
observers  were  able  to  abstract  knowledge  from  the  structure  imposed  by  the 
transition  code  and  use  it  in  a  top-down  process  for  improved  classification. 

Reber  and  his  colleagues  have  coined  the  phrase  "implicit  learning"  for 
this  process,  since  there  are  distinctions  between  it  and  normal  learning. 
First,  the  rules  must  be  abstract  or  complex.  Second,  the  learning  takes  place 
outside the consciousness of the observer; an attempt to consciously formulate
the rules impedes performance. This does not mean that implicitly learned
knowledge sources cannot be utilized for performing tasks; rather, it means that
individuals have difficulty explaining the exact rules constituting the
knowledge source. Reber and Allen (1978, p. 191) describe implicit learning as
"an  hypothesized  abstraction  process,  a  nonconscious,  nonrational,  automatic 
process  whereby  the  structural  nature  of  the  stimulus  environment  is  mapped  into 
the  mind  of  the  attentive  subject." 

Using a similar finite-state transition code, Howard and Ballas (1980)
extended Reber's work by demonstrating that top-down processes are also present
in  perception  of  non-speech  sounds.  Auditory  patterns  were  constructed  with 
either  pure  tones  or  complex  "environmental"  sounds  such  as  a  pipe  clang  or 
steam  hiss.  Patterns  with  syntactic  structure  (transition  code)  were  classified 
better  than  those  without  structure,  especially  when  the  patterns  were  composed 
of  pure  tones.  For  complex  tones,  syntactic  and  semantic  structure  (thematic 
instructions)  produced  an  interaction  effect:  when  the  sounds  were 
interpretable  (occurring  in  an  order  consistent  with  semantic  associations) 
performance  was  improved  by  thematic  instructions. 

These  findings  indicate  that  top-down  processing  can  occur  in  the 
classification  of  nonspeech  sounds.  The  Reber  studies,  as  well  as  that  of 
Howard and Ballas, indicate that top-down processing occurs when an entire pattern
is  the  experimental  focus.  Can  top-down  processing  occur  in  simple,  nonspeech 
patterns  when  the  experimental  task  concerns  an  individual  element  of  the 
pattern,  rather  than  the  pattern  as  a  whole?  The  literature  suggests  that  this 
may  be  the  case. 

Auditory  Stream  Segregation 

An  interesting  and  pervasive  auditory  phenomenon  occurs  in  continuously  and 
rapidly  presented  tones  of  sufficient  frequency  separation.  The  tones  separate 
into  similar  "streams"  or  "channels"  which  are  perceived  as  simultaneous  and 
independent, rather than temporally alternating. An analogy from visual
perception is the Necker cube, which can be seen from two possible perspectives,
but  only  one  at  a  time.  Similarly,  observers  listening  to  a  streamed  auditory 
pattern  can  focus  their  attention  on  individual  streams  but  not  on  the 
relationships of the pattern as a whole. This phenomenon has been variously
referred  to  as  the  trill  threshold  (Miller  &  Heise,  1950),  rhythmic  fission 
(Dowling,  1978),  or  auditory  stream  segregation  (Bregman  &  Campbell,  1971). 
Miller  and  Heise  (1950)  initially  illustrated  the  phenomenon  by  alternating  two 
tones  which  gradually  separated  in  frequency.  At  a  separation  of  approximately 
three  semitones  the  tones  were  perceived  as  simultaneous  and  independent  rather 
than alternating.

Bregman  and  Campbell  (1971)  investigated  these  effects  in  tonal  patterns 
composed  of  six  elements.  Alternate  tones  were  selected  from  two  frequency 
ranges  and  the  resulting  pattern  was  repeated  continuously.  Observers  were 
asked  to  report  the  order  of  tones  within  the  pattern,  being  given  as  much  time 
as  needed  to  reach  a  decision.  Results  indicate  that  reports  of  tone  order 
within  frequency  ranges  (streams)  were  accurate  while  those  across  ranges  were 
not.  In  addition,  observers  tended  to  report  tonal  order  in  terms  of  streams 
and  not  the  pattern  as  a  whole.  Bregman  and  Campbell  stressed  that  the 
formation  of  streams  results  from  an  organizational  process  of  perception  not 
directly  attributable  to  physical  properties  of  the  stimulus. 

Subsequent  research  has  expanded  this  theme.  Bregman  (1978)  interprets 
this  auditory  streaming  effect  as  a  mechanism  which  assists  in  the  sorting  of 
the  complex  waveforms  of  audition.  Although  we  may  actually  hear  a 
conglomeration  of  sounds  we  perceptually  sort  or  stream  each  component  to  its 
respective source. Howard and Ballas (1980) note that Bregman's work has
concentrated on identifying the rules which govern auditory streaming at a basic
level. Using complex stimuli in the experiment previously discussed, they have
shown that specific factors can also influence streaming. These factors include
"the listener's skills, intentions, and knowledge of the stimuli" (Howard &
Ballas, 1980, p. 432).

Effects  of  Structure  on  Pattern  Perception 

One  aspect  of  auditory  patterns  that  might  be  considered  a  higher  level 
factor  is  structure.  Guilford  (Guilford  &  Hilton,  1933;  Guilford  &  Nelson, 
1936)  suggested  that  a  rising  or  falling  tone  caused  corresponding  perceptual 
changes  in  surrounding  tones.  Heise  and  Miller  (1951)  empirically  tested  this 
observation  by  employing  the  auditory  streaming  effect.  Patterns  were  organized 
according  to  pitch  in  rising,  falling,  v-shaped  (that  is,  falling,  then  rising 
in  pitch),  or  inverted  v-shaped  (rising,  then  falling)  forms.  The  experimental 
task was to increase or decrease the frequency of a single element until it
perceptually separated from the pattern. The overall tonal configuration was
found to influence perception, especially in the v-shaped or inverted v-shaped
forms. In these patterns frequency alterations which sharpened the v's were
perceived more easily than those which flattened the v's.

Idson  and  Massaro  (1976)  also  employed  auditory  streaming,  along  with 
backward  recognition  masking,  to  show  that  pattern  structure  can  influence 
perception.  The  backward  recognition  masking  effect  (Massaro,  1970)  involves 
the  presentation  of  a  target  tone  followed  by  a  masking  tone.  The  ability  to 
recognize  the  first  tone  is  correlated  with  the  intertone  interval:  performance 
increases  with  duration  up  to  approximately  250  ms.  Idson  and  Massaro 
hypothesized  that  structure-induced  auditory  stream  segregation  in  a  pattern 
might  influence  backward  recognition  masking  since  elements  within  a  stream  can 
be related to each other while across-stream elements cannot be related (Bregman
& Campbell, 1971). As predicted, identification in patterns containing
within-stream masks was lower than identification in those containing masks
drawn from different octaves. When a single tone (instead of a pattern) and a
mask from a different octave were presented, the backward masking effect was
elicited.

Musical  structure  has  been  shown  to  influence  performance  in  a  variety  of 
tasks (Attneave & Olson, 1971; Bartlett & Dowling, 1980; Cuddy, Cohen, &
Mewhort, 1981; Cuddy, Cohen, & Miller, 1979; Idson & Massaro, 1976; Krumhansl
& Shepard, 1979). Musical structure includes the notion of scale and tonal
relationship described in terms such as octave, tone chroma, diatonic, tonic,
cadence, contour, and excursion (Deutsch, 1977; Dowling, 1978). Music theory
suggests that the perception of melodies involves more than physically
describable properties; it also draws on culture-bound schemata (Dowling, 1978).

Dewar,  Cuddy,  and  Mewhort  (1977)  investigated  the  effects  of  pattern 
structure  and  contextual  tones.  Musical  patterns  containing  7  of  13  tones 
specific  to  an  octave  were  chosen  to  constitute  musical  structure.  The  random 
assignment  of  seven  tones  in  the  octave  constituted  non-musical  structure. 
Observers  listened  to  a  standard  sequence  and  then  chose  between  two 
alternatives,  one  of  which  was  exactly  the  same.  In  the  no  context  condition 
two  individual  tones  were  presented;  in  the  full  context  condition  two  complete 
sequences  were  presented.  The  effects  for  both  structure  and  context  were 
highly  significant,  in  favor  of  full  context  and  musical  structure. 

Cuddy,  Cohen,  and  Miller  (1979)  further  delineate  the  role  of  structure  in 
a  melody  transformation  experiment.  This  task  involves  the  relocation  of  an 
entire  sequence  to  a  higher  or  lower  octave.  Once  relocated,  a  "target"  trial 
consists of the relative change in frequency of one element of the pattern. The
amount of musical structure was varied and performance was found to be directly
related to the amount of musical structure present. Cuddy, Cohen, and Mewhort
(1981) used identical experimental procedures to extend the Cuddy et al. (1979)
findings. Various amounts of sequence structure including harmonic structure
and simple or complex contour were investigated. Again, performance was largely
determined  by  the  amount  of  musical  structure  present. 

The  experiments  discussed  in  this  section  strongly  suggest  that  contextual 
structure  can  provide  a  knowledge  source  for  the  abstraction  of  information.  As 
we  have  seen,  this  information  can  be  used  in  a  variety  of  tasks  to  improve 
performance.  One  paradigm  in  which  the  effects  of  contextual  structure  are 
considered  deleterious  is  that  of  Watson  and  his  colleagues  (Watson,  Wroton, 
Kelly, & Benbassat, 1975; Watson, Kelly, & Wroton, 1976; Spiegel & Watson,
1981).  They  state  that  "it  appears  to  be  the  variation  of  the  contextual 
pattern  in  which  the  signal  is  embedded  which  is  associated  with  severe 
degradation  of  performance"  (Watson  et  al.,  1976,  p.  1184).  However,  a  deeper 
understanding  of  the  paradigm  is  necessary  to  appreciate  their  use  of  the  term 
"contextual  pattern."  Watson's  paradigm  employs  highly  trained  (task-specific) 
observers  who  resolve  patterns  composed  of  extremely  short-duration  tones. 
Typically,  an  observer  in  such  an  experiment  hears  a  small  number  of  patterns  on 
a  large  number  of  trials.  A  change  in  "contextual  pattern"  in  this  sense  is 
simply  how  well  an  entire  pattern  is  learned,  rather  than  the  structure  of  the 
elements  which  compose  it. 

Using  these  highly  overlearned  patterns,  Watson  and  his  colleagues  have 
found  that  the  overall  uncertainty  of  how  the  "target  tone"  (i.e.,  tone  subject 
to  frequency  change)  can  vary  influences  the  listeners'  ability  to  detect 
frequency change. They identify four factors which influence uncertainty in
single-tone frequency discrimination: 1) the number of patterns heard, 2) the
number  of  target  tone  frequencies,  3)  the  number  of  target  tone  positions,  and 
4)  the  temporal  position  of  the  target  tone.  Of  the  four  uncertainty  variables, 
Watson  argues  that  the  number  of  patterns  presented  is  the  most  influential 
factor  in  frequency  discrimination. 

The  Structure  of  Individual  Elements  as  a  Knowledge  Source 

However, structure among the individual elements of a pattern, rather than the
"contextual pattern" itself, may also influence performance. The previously discussed
literature suggests that this structure can provide a source of knowledge for
top-down processing. It is hypothesized that providing structure to tones
surrounding the target tone will provide such a knowledge source and improve
performance in the Watson paradigm. Three conditions will be arranged to test
this hypothesis: a constant, a random, and a structured condition. In the
constant  condition  only  one  pattern  will  be  played  to  listeners,  while  in  the 
random  and  structured  conditions  12  patterns  will  be  played.  It  is  predicted 
that  the  constant  condition  will  result  in  superior  performance  since  there  are 
fewer  patterns  and  consequently  less  uncertainty,  while  the  other  three 
variables  Watson  cited  are  held  to  a  minimum.  For  the  structured  and  random 
conditions  all  four  variables  will  be  held  constant.  The  only  difference  will 
be  that  the  tones  surrounding  the  tone  subject  to  change  will  contain  an 
inverted-v  structure  (rising  and  then  falling  in  pitch)  similar  to  the  structure 
employed in the studies of Heise and Miller (1951). It is predicted that
listeners  will  employ  top-down  processing  in  the  structured  condition  to  improve 
performance  relative  to  the  random  condition. 

The  second  experimental  hypothesis  concerns  the  effect  of  musical  training. 
In experiments involving musically structured tones, observers with musical
training typically perform better than untrained observers. For detecting
correct melody transpositions the effect is quite pervasive (Attneave & Olson,
1971; Bartlett & Dowling, 1980; Cuddy, 1971; Cuddy & Cohen, 1976; Cuddy, Cohen,
& Mewhort, 1981; Krumhansl & Shepard, 1979). As an example, Cuddy and Cohen
(1976) asked both types of observers to discriminate between transpositions in
which one tone was altered. Trained observers performed near the 90%
discrimination level while untrained observers were not much above chance
(60%).

The  effect  of  musical  training  in  other  types  of  frequency  discrimination 
is  not  so  well  defined.  In  absolute  pitch  judgement  performance  is  correlated 
with  musical  training  (Cuddy,  1968;  Cuddy  &  Cohen,  1976).  There  is  also 
evidence  that  frequency  discrimination  in  patterns  is  affected  by  musical 
training.  Dewar,  Cuddy,  and  Mewhort  (1977)  found  that  in  a  two  alternative 
forced-choice  procedure  musical  training  facilitated  the  ability  to  detect 
frequency  changes.  Using  essentially  the  same  paradigm,  it  is  hypothesized  that 
musical  training  will  improve  performance  in  the  present  study. 

The  third  experimental  prediction  involves  the  factor  of  change  magnitude. 
Since the observers in the present study will not have as much training as those
of Watson, the frequency changes will be increased by one order of magnitude.
It  is  predicted  that  observer  performance  will  be  directly  related  to  absolute 
frequency  change.  No  prediction  is  made  on  the  basis  of  positive  or  negative 
changes. 

Method 

Participants 

Twelve  paid  individuals  served  as  listeners  in  the  experiment.  Musically 
trained listeners were recruited from the Catholic University School of Music.
No attempt was made to segregate the various disciplines within the field (voice
majors,  composers,  etc.)  or  level  of  training.  Nonmusically  trained  listeners 
either  responded  to  advertisements  or  were  recruited  from  a  subject  pool.  Two 
musically  and  two  nonmusically  trained  listeners  were  randomly  assigned  to  each 
of  the  three  conditions. 

Stimuli 

The  patterns  were  composed  of  eleven  tones  played  in  succession  with  an 
intertone  interval  of  5  ms.  The  duration  of  each  tone  was  40  ms  with  a  2.5  ms 
rise/fall  time.  Fourteen  different  patterns  were  constructed  by  randomly 
choosing  without  replacement  from  tones  of  500,  565,  638,  721,  815,  920,  1040, 
1175,  1327,  1500,  and  1673  Hz  with  the  qualification  that  the  1673  Hz  target 
tone  and  the  tone  occupying  the  sixth  position  were  switched  in  each  pattern. 
This  allowed  all  patterns  to  have  the  target  tone  in  the  same  temporal  position. 
A pattern for the training session and a pattern for the constant condition were
chosen randomly, with the remaining 12 forming the basis for both the random and
structured conditions. To construct a structured pattern, the tones prior to
the target tone in a random pattern were rearranged to ascend in frequency and
the later tones were rearranged to descend in frequency.
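
The construction rule described above can be summarized in a short sketch. The Python below is an illustration under stated assumptions, not the original stimulus program: the function names, the seed, and the use of Python's `random` module are hypothetical, and the duplicate check is simply one way of honoring the requirement that the fourteen patterns be different.

```python
import random

# Frequencies (Hz) taken from the Stimuli section.
CONTEXT_HZ = [500, 565, 638, 721, 815, 920, 1040, 1175, 1327, 1500]
TARGET_HZ = 1673
TARGET_POS = 5  # zero-based index of the sixth temporal position


def random_pattern(rng):
    """Shuffle all eleven tones, then swap the target into the sixth position."""
    tones = CONTEXT_HZ + [TARGET_HZ]
    rng.shuffle(tones)
    i = tones.index(TARGET_HZ)
    tones[i], tones[TARGET_POS] = tones[TARGET_POS], tones[i]
    return tones


def make_patterns(n, seed=0):
    """Draw n distinct random patterns (seed is an arbitrary, hypothetical choice)."""
    rng = random.Random(seed)
    patterns = []
    while len(patterns) < n:
        candidate = random_pattern(rng)
        if candidate not in patterns:
            patterns.append(candidate)
    return patterns


def structured_pattern(pattern):
    """Pre-target tones ascend and post-target tones descend (inverted v)."""
    before = sorted(pattern[:TARGET_POS])
    after = sorted(pattern[TARGET_POS + 1:], reverse=True)
    return before + [pattern[TARGET_POS]] + after


def different_pattern(pattern, delta_hz):
    """Copy of the pattern with the target tone shifted by delta_hz."""
    changed = list(pattern)
    changed[TARGET_POS] = pattern[TARGET_POS] + delta_hz
    return changed
```

Because the 1673 Hz target is the highest tone in the set, sorting the context tones around it in this way places the target at the apex of the inverted v.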

To  generate  a  "different"  pattern,  the  target  tone  was  replaced  with  a  tone 
having one of four frequency changes relative to the 1673 Hz target: +32, -32, -48, or
-64  Hz  (see  footnote).  The  patterns  were  presented  at  an  approximate  listening 
level  of  76  dB  SPL.  Both  the  presentation  of  the  same  and  different  trials 
along  with  the  size  and  direction  of  the  change  in  frequency  were  varied 
randomly  in  experimental  trials. 

The training trials employed a variation of the "simulated method of
adjustment" used by Spiegel and Watson. The same and different trials were
presented  alternately.  The  different  trials  were  constructed  to  illustrate 
explicitly  what  constituted  a  different  trial.  Eighteen  differences  ranging 
from  350  Hz  above  to  350  Hz  below  the  target  tone  were  used  including  the  four 
differences  to  be  used  in  the  experiment.  The  presentation  of  differences  was 
systematic,  going  from  the  largest  to  the  smallest  difference  above  the  target 
tone,  then  proceeding  from  the  smallest  to  the  largest  difference  below. 

Apparatus 

All experimental events were controlled by a general purpose laboratory
computer (Digital PDP-8/e). The tones were synthesized with the computer using
standard digital techniques. They were output on a 12-bit digital-to-analog
converter at a sampling rate of 12.5 kHz, low-pass filtered at 3 kHz (Krohn-Hite
Model 3550), attenuated, and presented binaurally over matched Telephonics
TDH-49 headphones with MX-41/AR cushions. Prompts were presented by a video
monitor  in  the  test  booth  and  listeners  indicated  their  responses  by  pressing 
buttons  on  a  solid-state  keyboard. 
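
For readers who wish to approximate the stimuli, the following sketch renders one eleven-tone pattern as a list of samples using the timing values given in the Stimuli and Apparatus sections (40 ms tones, 2.5 ms rise/fall, 5 ms intertone interval, 12.5 kHz sampling rate). It is only an illustration: the linear ramp shape, the truncation of fractional sample counts, and all function names are assumptions, and no attempt is made to model the 12-bit converter, the 3 kHz filter, or the presentation level.

```python
import math

SAMPLE_RATE = 12_500  # Hz, from the Apparatus section
TONE_MS = 40.0        # tone duration
RAMP_MS = 2.5         # rise/fall time
GAP_MS = 5.0          # intertone interval


def n_samples(ms):
    """Number of whole samples in a duration (fractional samples are dropped)."""
    return int(ms * SAMPLE_RATE / 1000)


def tone(freq_hz):
    """One pure tone with onset and offset ramps (linear shape is an assumption)."""
    n = n_samples(TONE_MS)
    ramp = n_samples(RAMP_MS)
    samples = []
    for i in range(n):
        gain = min(1.0, (i + 1) / ramp, (n - i) / ramp)
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
    return samples


def render_pattern(pattern_hz):
    """Concatenate the tones of a pattern with silent gaps between them."""
    gap = [0.0] * n_samples(GAP_MS)
    out = []
    for k, freq in enumerate(pattern_hz):
        if k > 0:
            out.extend(gap)
        out.extend(tone(freq))
    return out
```

With these values a complete pattern lasts roughly half a second (eleven 40 ms tones plus ten 5 ms gaps), which conveys how brief each component is relative to the whole.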

Procedure 

Listeners  were  tested  individually  in  a  sound-attenuated  booth  for  1  hour 
on  5  consecutive  days.  The  first  session  was  devoted  to  training.  The  second 
session  was  30  min  of  training  followed  by  the  first  experimental  block. 
Sessions  3  through  5  contained  5  min  of  training  and  2  experimental  blocks, 
resulting  in  a  total  of  7  data  blocks  for  the  experiment. 

The  instructions  were  explicit  due  to  the  difficult  nature  of  the  task. 
Before the first training session the listeners were informed that they would
hear auditory patterns composed of short tones played very quickly and that a
pair  of  patterns  would  be  presented  together.  They  were  told  that  the  second 
pattern  of  the  pair  might  be  different  and  that  this  difference  would  only  be 
found  in  the  pitch  of  the  sixth  tone.  A  graphic  illustration  of  a  paired 
presentation  was  included.  They  were  informed  that  same  and  different  trials 
would  alternate,  that  the  changes  were  systematic  and  that  their  only  task  was 
to  listen  carefully  to  the  patterns  and  how  they  were  different.  Before  the 
first  experimental  block  was  presented  another  set  of  instructions  was  given 
indicating that same and different trials would be presented randomly and that
it  was  their  task  to  decide  whether  the  patterns  were  the  same  or  different. 
They  were  shown  the  possible  responses  and  asked  to  let  a  numeric  choice  reflect 
their  confidence  in  the  decision.  Any  questions  were  answered.  The 
instructions  were  identical  for  all  groups. 

A  trial  began  when  the  word  "listen"  appeared  on  the  screen.  The  first 
pattern  and  the  second  pattern  were  accompanied  by  the  prompts  "pattern  1"  and 
"pattern  2"  as  they  were  played.  These  6  choices  immediately  appeared  on  the 
screen  following  the  prompts: 

1=definitely different
2=probably  different 
3=possibly  different 
4=possibly  the  same 
5=probably  the  same 
6=definitely  the  same 

After the listeners indicated their choice by pressing one of six buttons
on the keyboard, immediate visual feedback was provided. After a 1.5 s
intertrial interval the next trial began. The training program followed the
same procedure except that instead of response choices immediate visual feedback
was provided after the patterns had been played. No listener response was
required and a new trial began automatically. In the training session 432
paired pattern trials were given. Each experimental block consisted of 144
trials. Thus, across the 7 blocks, each listener heard 1008 total test
trials. Opportunities for two breaks were provided in each experimental session
and in the hour long training session.

Results  and  Discussion 

Listener  confidence  ratings  were  used  to  compute  receiver-operating 
characteristics  (ROC’s).  The  area  under  a  ROC  yields  a  response-bias-free  index 
of  performance  (Green  &  Swets,  1966).  To  assess  the  effect  of  the  change  in 
frequency  differences,  listener  responses  were  collapsed  across  trials  and  ROC's 
were computed. A Change X Training X Group ANOVA was performed. The main
effect of Change was highly significant, F(3,18) = 6.58, p < .005, while the main
effect of Group was found to be not significant, F(2,6) = 3.87, p < .10. Training
had no effect, F(1,6) = 1.05. No interactions were found to be significant:
Training X Group, F(2,6) = 2.14, p < .20; Change X Group, F(6,18) = 1.98, p < .20;
Change X Training, F(3,18) = 1.45; and Change X Training X Group, F(6,18) = 1.32.
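
As a hedged illustration of how an ROC area can be obtained from six-point confidence ratings, the sketch below treats "different" trials as signal, cumulates ratings from the most confident "different" response (1) upward, and integrates the resulting points with the trapezoidal rule. The trapezoidal rule and the function names are assumptions; the text does not state how the areas were actually computed.

```python
def roc_points(ratings_different, ratings_same, n_categories=6):
    """Cumulative (false-alarm rate, hit rate) pairs, one per rating criterion.

    Ratings follow the response scale in the Procedure section:
    1 = definitely different ... 6 = definitely the same.
    """
    points = [(0.0, 0.0)]
    for criterion in range(1, n_categories + 1):
        hits = sum(r <= criterion for r in ratings_different)
        false_alarms = sum(r <= criterion for r in ratings_same)
        points.append((false_alarms / len(ratings_same),
                       hits / len(ratings_different)))
    return points  # the last point is always (1.0, 1.0)


def roc_area(points):
    """Area under the ROC by the trapezoidal rule (an assumed method)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under this scheme an area of 0.5 corresponds to chance performance and an area of 1.0 to perfect discrimination, regardless of how liberally or conservatively a listener uses the rating scale.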

In  order  to  assess  the  effects  of  experience,  listener  responses  were 
collapsed  within  experimental  blocks  and  the  corresponding  ROC  areas  were 
computed.  A  Block  X  Training  X  Group  ANOVA  was  then  performed.  The  main  effect 
of Group, F(2,6) = 3.47, p < .10, was not significant while Training, F(1,6) = .68,
and Block, F(6,36) = 1.51, had no effect. No interactions were significant:
Training X Group, F(2,6) = 2.04; Block X Training, F(6,36) = .38; and Block X
Training X Group, F(12,36) = .40.

The  ROC  areas  were  averaged  across  listeners  within  the  three  experimental 
groups  in  each  analysis  and  plotted.  The  Block  X  Group  data  are  presented  in 
Figure  1;  the  Change  X  Group  data  are  presented  in  Figure  2.  Both  graphs 
reveal  apparently  consistent  differences  between  groups  which  are  only 
marginally  indicated  by  the  statistics  employed.  To  investigate  this 
discrepancy  further  the  homogeneity  of  variance  was  tested.  The  Group  variances 
in  each  analysis  were  subjected  to  Hartley's  Fmax  test.  For  the  Group  X 
Block  variances,  Fmax  =  3.24,  p<.05;  for  the  Change  X  Group  variances,  Fmax  = 
22.73,  p<.01.  In  both  instances  an  underlying  assumption  of  ANOVA  was  violated 
for  the  main  effect  of  Group. 
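
As a rough illustration of the homogeneity check, Hartley's Fmax is the ratio of the largest to the smallest group variance, which is then compared with a tabulated critical value. The sketch below uses invented ROC-area scores and a hypothetical helper name, `hartley_fmax`; it is not a reconstruction of the original computation.

```python
from statistics import variance


def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(scores) for scores in groups]
    return max(variances) / min(variances)


# Hypothetical ROC areas per listener in each group (not data from the study).
structured = [0.78, 0.80, 0.82, 0.84]
constant = [0.55, 0.65, 0.75, 0.85]
random_group = [0.50, 0.52, 0.54, 0.56]

print(round(hartley_fmax([structured, constant, random_group]), 2))
```

A ratio near 1 indicates comparable variances; a large ratio, as reported for the Change X Group data, signals that the equal-variance assumption is doubtful.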

To  circumvent  this  problem  a  nonparametric  Kruskal-Wallis  ANOVA  by  ranks 
was  performed.  For  each  individual  the  results  were  averaged  across  blocks  for 
an  experiment-wide  estimation  of  performance.  (An  averaging  across  the  Change  X 
Group  data  resulted  in  the  same  rankings.)  These  scores  were  then  placed  in  rank 
order.  The  analysis  indicated  that  the  rankings  were  significantly  different  at 
an  Alpha  level  of  .05,  H=6.75.  This  analysis  allows  the  conclusion  that  the 
structured  group  performed  significantly  better  than  the  random  group  as 
predicted. 
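
The rank-based analysis can be sketched as follows. This is a minimal implementation of the Kruskal-Wallis H statistic using average ranks for ties and no tie correction; the scores and the function name `kruskal_wallis_h` are hypothetical, and the example does not attempt to reproduce the reported H=6.75.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of score lists (average ranks for ties)."""
    pooled = sorted(x for g in groups for x in g)
    # Map each distinct score to its average rank (ranks start at 1).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)


# Hypothetical experiment-wide ROC areas per listener (not data from the study).
structured = [0.81, 0.77, 0.84, 0.69]
constant = [0.70, 0.66, 0.58, 0.73]
random_group = [0.52, 0.55, 0.49, 0.60]

print(round(kruskal_wallis_h([structured, constant, random_group]), 2))
```

With three groups, H is commonly referred to a chi-square distribution with 2 degrees of freedom, whose .05 critical value is about 5.99; for very small samples, exact tables are usually preferred.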

Although  this  analysis  does  not  permit  claims  of  significant  differences 
between  the  structured  and  constant  groups,  inspection  of  the  graphs  and 
rankings  indicates  that  some  difference  does  exist.  This  is  surprising  since 
Watson  et  al.  (1976,  p.  1183)  found  that  "the  largest  change  in  performance 
associated  with  a  single  factor  of  stimulus  uncertainty  is  that  related  to  the 
number  of  patterns,  or  different  tonal  sequences,  in  the  stimulus  catalogue.” 
Since  the  structured  condition  had  12  patterns  while  the  constant  condition  had 
only  one,  these  results  are  unexpected.  In  view  of  Watson’s  findings  it  was 
predicted  that  the  observers  in  the  constant  condition  would  outperform 
observers  in  the  structured  condition. 

The  effect  of  musical  training  was  virtually  nonexistent.  There  are 
several  possible  explanations.  First,  the  criteria  for  musical  training  were  not 
as  stringent  in  this  study  as  in  other  studies.  For  example,  Cuddy,  Cohen,  and 
Mewhort  (1981)  separated  trained  and  highly  trained  listeners  on  the  basis  of 
specific  conservatory  ratings  and  the  number  of  instruments  practiced  on  a 
regular  basis.  In  the  present  study  the  musical  training  group  included  voice 
majors,  and  a  composer  as  well  as  those  who  practiced  instruments.  Second,  the 
experimental  task  differs  from  the  majority  of  studies  indicating  effects  due  to 
musical  training.  These  experiments  measure  performance  on  relative  frequency 
changes  through  the  melody  transposition  task. 

Even  in  studies  reporting  musical  effects  with  similar  tasks  there  are 
differences  in  the  nature  of  the  stimuli.  Dewar,  Cuddy,  and  Mewhort  (1977) 
report  training  effects  in  a  study  differing  essentially  in  only  three  aspects: 
tone  duration,  type  of  tone,  and  definition  of  structure.  The  tones  were 
generated  on  a  piano  and  taped,  whereas  the  present  study  employed  pure  tones. 
Also,  the  tones  were  much  longer  in  that  study,  lasting  667  ms  as  opposed  to  40 
ms.  Finally,  the  tones  were  chosen  with  musical  structuring  constraints.  The 
absence  of  timbre  and  musical  structure,  as  well  as  the  short  duration  of  the 
tones  in  the  current  experiment,  certainly  contribute  to  the  absence  of  a  musical 
training  effect. 

The  primary  purpose  of  the  present  experiment  was  to  examine  the  role  of 
top-down  processing  and  knowledge  sources  in  perception.  The  overall 
performance  of  the  structured  group  indicates  that  these  observers  were  able  to 
extract  information  from  the  structure  and  use  it  in  top-down  processing.  Some 
specific  aspects  of  performance  also  suggest  that  a  knowledge  source  existed  for 
these  observers. 

The  Change  X  Group  graph  can  be  interpreted  in  such  a  manner.  For  the 
structured  group  the  three  negative  changes  have  performance  levels  which 
reflect  their  absolute  change.  However,  the  positive  change  has  a  greatly 
reduced  performance  level.  It  should  be  remembered  that  the  one  positive  change 
represents  twice  the  number  of  observations  as  any  negative  change,  making  the 
overall  ratio  of  negative  changes  to  positive  changes  in  the  experiment  3:2.  In 
addition,  two  of  the  three  negative  changes  were  of  a  larger  magnitude  than  the 
positive  changes.  From  the  difference  in  number  and  magnitude  one  can  conclude 
that  the  negative  changes  contained  more  information  than  the  positive  changes. 
From  the  graph  it  is  obvious  that  observers  in  the  structured  condition  did  very 
well  on  the  negative  changes.  This  suggests  that  these  observers  were  able  to 
focus  on  the  aspects  of  change  most  likely  to  maximize  performance,  presumably 
through  a  knowledge  source  provided  by  pattern  structure. 
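
The 3:2 ratio follows directly from the changes that were actually played, as described in the footnote. The short tally below is only an illustration: the ordering of the list is arbitrary, and the equal split between "same" and "different" trials is an assumption made here to recover the stated rate of one false-feedback trial in 12.

```python
from fractions import Fraction

# Changes (Hz) actually played on the six "different" trial types:
# one intended change came out as no change (0), and +32 Hz was played twice.
played = [-64, -48, -32, +32, +32, 0]

negative = sum(1 for change in played if change < 0)
positive = sum(1 for change in played if change > 0)
ratio = Fraction(negative, positive)

# Assumption: "same" and "different" trials were equally frequent,
# so half of all trials were designated "different".
false_feedback = Fraction(played.count(0), len(played)) * Fraction(1, 2)

print(ratio, false_feedback)
```
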

These  specific  aspects  of  performance,  along  with  the  overall  superiority, 
indicate  that  observers  in  the  structured  condition  were  able  to  abstract 
knowledge  from  the  structure  of  the  patterns,  thereby  gaining  information  and 
improving  performance.  Before  speculating  about  the  exact  nature  of  the 
knowledge  source  though,  Watson's  findings  and  the  experimental  task  need  to  be 
discussed. 

Using  the  same  paradigm  with  highly  trained  (i.e.,  greater  than  15  hrs) 
observers  Watson  has  found  that  the  overall  "uncertainty"  of  how  the  target  tone 
can  vary  influences  the  listener's  ability  to  resolve  patterns.  As  indicated 
above,  Watson  et  al.  (1976)  identified  four  factors  which  determine  stimulus 
"uncertainty":  1)  the  number  of  patterns,  2)  the  number  of  target  tone 
positions,  3)  the  number  of  target  tone  frequencies,  and  4)  the  temporal 
position  of  the  target  tone.  When  these  "uncertainty"  variables  were  held  to  a 
minimum  several  perceptual  changes  occurred:  a)  the  sequence  separated  into  2-3 
segments  composed  of  elements  close  in  frequency,  similar  to  Bregman's  "auditory 
streaming"  effect;  b)  the  stream  containing  the  target  tone,  and  especially  the 
target  tone  itself  increased  in  salience;  c)  the  loudness  of  the  target  tone 
and  its  stream  likewise  increased. 

These  perceptual  changes  occur  only  when  observers  can  focus  their 
attention  (i.e.,  when  "uncertainty"  is  at  a  minimum).  Even  with  an  auditory 
pattern  as  short  as  the  one  employed  in  this  paradigm  listeners  can  focus  their 
attention  throughout  its  length.  When  observers  are  forced  to  search  the  entire 
pattern  for  the  tone  subject  to  change,  their  performance  suffers.  However, 
when  observers  know  where  they  should  be  listening,  performance  is  only  slightly 
less  than  expected  for  isolated  tones  of  equal  duration  (Watson  et  al.,  1976). 

For  the  structured  and  random  conditions  the  four  factors  constituting 
stimulus  uncertainty  were  held  constant.  The  only  difference  was  in  the 
structuring  of  the  elements  which  composed  a  sequence  in  the  structured 
condition.  This  structuring  facilitated  the  focus  of  attention;  the  difference 
between  groups  was  due  to  a  difference  in  ease  of  attentional  focusing. 

This  focusing  can  be  attributed  to  several  aspects  of  the  stimuli.  For  the 
structured  condition  the  tones  prior  to  the  target  tone  were  ascending  in 
frequency  and  later  tones  were  descending  in  frequency.  Knowing  that  the  target 
tone  was  sandwiched  between  these  two  runs  helped  observers  locate  the  exact 
location  of  change.  This  is  an  obvious  consequence  of  the  rules  applied  to 
provide  structure. 


A  not-so-obvious  consequence  has  to  do  with  the  auditory  streaming  effect. 
Watson,  Kelly  and  Wroton  (1976)  indicate  that  the  tonal  patterns  separated  into 
2-3  streams  during  the  course  of  experimentation  and  that  the  stream  containing 
the  target  tone  increased  in  salience.  In  the  present  experiment,  the  rules 
which  provided  the  structure  also  placed  the  elements  forming  the  target  stream 
consistently  in  the  middle  of  the  pattern.  Thus  observers  could  utilize  both 
the  overall  configuration  of  rising  and  falling  frequency  and  the  centralized 
streaming  to  focus  attention.  This  constitutes  the  knowledge  source  employed  in 
top-down  processing. 

Several  qualifications  need  to  be  made  in  comparing  the  results  of  this 
study  to  Watson’s  findings.  Although  the  stimuli  were  altered  only  by  the 
addition  of  one  tone  and  slightly  larger  differences,  the  training  was 
substantially  different.  Watson's  observers  were  trained  for  longer  periods  of 
time,  typically  a  minimum  of  15  hrs  before  any  data  were  collected.  Although 
the  stimulus  uncertainty  was  reduced  to  a  minimal  level  and  the  training  program 
was  designed  to  facilitate  frequency  discrimination  skills,  observers  in  the 
present  study  had  only  one  and  one-half  hrs  of  training  before  data  were 
collected.  The  results  may  represent  the  lower  end  of  the  learning  curve  and 
any  comparison  to  Watson's  findings  must  be  interpreted  with  this  in  mind. 

Another  qualification  of  the  results  must  be  made.  This  is  due  to  the  fact 
that  1  of  6  test  trials  designated  as  different  (1  of  12  experimental  trials) 
was  actually  the  same,  and  false  feedback  was  given  each  time  that  that  specific 
different  trial  was  presented.  Although  this  seems  like  a  major  problem, 
several  factors  mitigate  its  impact.  First,  the  nature  of  psychoacoustic 
studies  implies  uncertainty.  In  this  experiment  the  changes  in  frequency  were 
small  and  training  was  necessary  for  listeners  to  detect  these  differences. 
Even  with  this  training  listeners  in  the  random  condition  averaged  only  52.2% 
across  all  trials:  only  2.2%  above  chance.  Obviously  on  the  11  of  12 
experimental  trials  with  correct  feedback  these  observers  had  trouble 
discriminating  between  same  and  different  trials  anyway. 

A  look  at  the  data  from  the  remaining  two  groups  suggests  that  if  anything, 
the  incorrect  feedback  reduced  the  size  of  the  experimental  effect.  The  same 
errors  were  present  for  all  the  groups  and  while  the  random  group  remained 
barely  above  chance  the  structured  and  constant  groups  were  able  to  make 
discriminations.  The  structured  group  averaged  69%  across  all  trials;  the 
constant  group  averaged  60%.  One  can  assume  that  correction  of  the  feedback 
error  would  have  facilitated  performance  in  these  two  groups  relative  to  the 
random  condition. 

Summary  and  Practical  Implications 

The  effect  of  the  size  of  the  frequency  change  was  highly  significant,  and  the 
changes  employed  represent  appropriate  difference  magnitudes  for  the  task. 
Performance  was  generally  related  to  the  absolute  frequency  change,  although  in 
some  instances  a  selective  bias  may  have  occurred.  The  predicted  effect  due  to 
musical  training  did  not  occur.  Contributing  factors  may  be  the  lack  of 
stringent  criteria  for  inclusion  in  the  training  group,  the  extremely  short 
duration  times  of  tones,  and  the  lack  of  musical  structure.  The  main  finding  of 
the  study  is  that  the  structure  of  contextual  tones  can  provide  a  knowledge 
source  to  facilitate  frequency  discrimination.  The  final  group  performance 
rankings  are  (from  best  to  worst):  structured,  constant,  and  random.  This  is 
somewhat  surprising.  The  predicted  difference  between  the  structured  and  random 
conditions  was  significant,  but  the  predicted  ordering  of  the  structured  and 
constant  conditions  was  reversed.  It  appears  that  the  structured  group  was  able 
to  abstract  information  which  influenced  the  ability  to  focus  attention  on 
relevant  aspects  of  the  pattern. 

In  real  world  auditory  monitoring  tasks  such  as  passive  sonar,  listeners 
must  identify  the  source  of  sounds  and  learn  as  much  as  possible  about  what  the 
target  object  is  doing.  These  sounds  often  occur  in  sequences  or  patterns  in 
which  a  temporal  structure  or  sound  order  can  provide  important  information 
about  the  event  being  monitored.  As  listeners  gain  familiarity  with  the  sound 
patterns  in  a  particular  environment,  they  will  be  able  to  direct 
limited-capacity  attentional  resources  to  the  most  important  aspects  of  the 
pattern.  The  present  experiment  illustrates  that  a  structural  knowledge  source 
can  play  an  important  role  in  attentional  focusing  when  individual  elements  must 
be  resolved  from  within  multicomponent  patterns.  Pattern-related  structural 
information  will  become  even  more  important  when  the  meaningful  pattern 
components  are  difficult  to  resolve  and/or  occur  at  a  very  low  signal-to-noise 
ratio.  In  such  a  situation,  component  ambiguity  can  sometimes  be  resolved  by 
reference  to  the  pattern  structure. 

FOOTNOTE 


1  The  study  was  originally  designed  to  have  6 
frequency  changes:  -64,  -48,  -32,  +32,  +48,  or  +64  Hz. 
A  programming  error  resulted  in  only  5  changes  being 
played.  One  trial  designated  as  a  "different"  or  a 
target  pattern  was  actually  the  same;  thus  all  listeners 
were  given  false  feedback  once  every  12  trials  on  the 
average.  A  related  error  repeated  one  change  two  times; 
this  error  resulted  in  only  4  magnitudes  of  frequency 
change,  -64,  -48,  -32,  and  32  Hz,  being  presented  and  the 
positive  change  played  twice. 